AD-A180 127




MIN-MAX  BIAS  ROBUST  M-ESTIMATES  OF  SCALE 


by 


R.  Douglas  Martin 
Ruben  H.  Zamar 




TECHNICAL  REPORT  No.  72 
December  1986 


Department  of  Statistics,  GN-22 
University  of  Washington 
Seattle,  Washington  98195  USA 




REPORT DOCUMENTATION PAGE

1.  Report number: 72
4.  Title: Min-Max Bias Robust M-Estimates of Scale
5.  Type of report and period covered: TR, 12/1/86-11/30/87
7.  Authors: R. Douglas Martin and Ruben H. Zamar
8.  Contract or grant number: N00014-84-C-0169
9.  Performing organization name and address: Department of Statistics, GN-22, University of Washington, Seattle, Washington 98195
10. Program element, project, task area and work unit numbers: NR-61 1-003
11. Controlling office name and address: ONR Code N63374, 1107 NE 45th St, Seattle, WA 98105
12. Report date: December 1986
15. Security class (of this report): Unclassified
16. Distribution statement (of this report): Approved for public release; distribution unlimited.
19. Key words: Robustness, M-estimates of Scale, Min-Max Bias, Maximal Bias, Breakdown Point.


MIN-MAX  BIAS  ROBUST  M-ESTIMATES  OF  SCALE 


R.  Douglas  Martin* 

Ruben  H.  Zamar* 

Department  of  Statistics,  GN-22 
University  of  Washington 
Seattle,  Washington  98195  USA 


ABSTRACT


Min-max bias robust M-estimates of scale are obtained for positive random
variables which have $\epsilon$-contaminated distributions. Any such estimate is a scaled
order statistic, with the order statistic determined by $\epsilon$. As $\epsilon \to 0.5$ the min-max
bias robust estimate becomes a scaled sample median, which thereby enjoys both a
high breakdown point of 0.5 and min-max bias robustness. Furthermore, for a
wide range of $\epsilon$, the min-max estimate is quite close to the scaled median in terms of
both structure and min-max bias behavior. Results are also obtained for random
variables whose distribution is $F = (1-\epsilon)F_0 + \epsilon H$ with $F_0$ symmetric about an
unknown location parameter. In particular we show that when $F_0$ is normal and
$0 \le \epsilon \le .35$, the min-max bias M-estimate of scale is a scaled order statistic applied
to the absolute value of centered data, with the median as the centering estimate.
This estimate is extremely close to a scaled median absolute deviation about the
median, in terms of both structure and bias behavior.


1.  INTRODUCTION


In  a  somewhat  neglected  section  of  his  by  now  classic  paper,  Huber  (1964)  considered 
the  problem  of  minimizing  the  maximum  asymptotic  bias  of  translation  equivariant  location 
estimates over $\epsilon$-contaminated models with asymmetric contamination. Huber showed that
the sample median solves this problem. He also downplayed the significance of the result
and concentrated on the “deeper theory” of min-max variance robustness over symmetric
neighborhoods (see Sections 4.3 and 4.4 of Huber, 1981).

In  spite  of  substantial  awareness  of  the  importance  of  controlling  bias  due  to 
contamination,  and  other  deviations  from  the  nominal  model,  the  literature  is  relatively 
devoid  of  global  results  on  this  robustness  problem.  Hampel’s  optimal  bounded  influence 
estimates  (see  Hampel  et  al.,  1986,  and  references  therein)  control  bias  only  for  vanishingly 
small fractions of contamination, and the shrinking $1/\sqrt{n}$-neighborhood approach (see Jaeckel,
1971,  Bickel,  1984  and  references  in  the  latter)  is  also  local  in  nature.  The  main  global 
approach  to  robustness  has  been  the  construction  of  high  breakdown  point  estimates  for 
multi-dimensional  problems.  Interesting  recent  results  along  these  lines  may  be  found  in 
Donoho  (1982),  Donoho  and  Huber  (1983),  Rousseeuw  (1984),  Rousseeuw  and  Yohai 
(1984),  Yohai  (1986),  Hampel  et  al.  (1986),  Yohai  and  Zamar  (1986). 

It  should  be  noted  that  the  problem  of  bias  robustness  and  the  desirability  of  optimal 
bias robust estimators, namely min-max bias estimates, is clearly recognized in Hampel et
al. (1986, Table 2A, p. 176), where the term “most B-robust” is used. We are also aware of
very  recent  work  on  bias  robustness  by  Donoho  and  Liu  (1986),  who  demonstrate  attractive 
properties  of  minimum  distance  estimators  in  this  regard. 

In the present paper we take a small step beyond Huber’s (1964) early results by
constructing min-max bias robust estimates of scale using M-estimates and $\epsilon$-contaminated
distributions. An M-estimate of scale for positive random variables $X_i$ is defined as

$$s_n = \inf\Big\{ s : \frac{1}{n}\sum_{i=1}^{n} \chi\Big(\frac{X_i}{s}\Big) \le b \Big\} \qquad (1.1)$$

for an appropriate function $\chi$. We shall work with monotone $\chi$ which satisfy A1-A3
stated in Section 2.1. The general definition (1.1) is needed to handle $\chi$ which are
discontinuous. If $\chi$ is continuous, then $s_n$ will satisfy


$$\frac{1}{n}\sum_{i=1}^{n} \chi\Big(\frac{X_i}{s_n}\Big) = b \qquad (1.1')$$

and $s_n$ will be unique when $\chi$ is strictly increasing on $\{t : \chi(t) \le b\}$.
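As an illustration only, the inf-type definition (1.1) can be evaluated numerically by bisection, since the average of $\chi(X_i/s)$ is nonincreasing in $s$ when $\chi$ is nondecreasing. The function name `m_scale`, its arguments, and the two example $\chi$ functions below are assumptions made for this sketch; they do not come from the report.

```python
def m_scale(x, chi, b, iters=200):
    """Approximate s_n = inf{ s > 0 : mean(chi(x_i / s)) <= b } by bisection.

    Assumes (hypothetically, for this sketch) that all x_i are positive and
    that chi is nondecreasing and bounded with chi(0) = 0, so the average
    below is nonincreasing in s.
    """
    n = len(x)

    def avg(s):
        return sum(chi(xi / s) for xi in x) / n

    lo = hi = max(x)
    while avg(hi) > b:                    # grow hi until the constraint holds
        hi *= 2.0
    while avg(lo) <= b and lo > 1e-300:   # shrink lo until the constraint fails
        lo /= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if avg(mid) <= b:
            hi = mid                      # mid is feasible: the infimum is at or below mid
        else:
            lo = mid                      # mid is infeasible: the infimum is above mid
    return hi


# Illustrative chi functions (assumptions, not taken from the report)
jump = lambda t: 1.0 if t > 0.7 else 0.0   # discontinuous 0-1 jump
smooth = lambda t: min(t * t, 1.0)         # continuous, bounded by 1

sample = [1.0, 2.0, 3.0, 4.0, 5.0]
s_jump = m_scale(sample, jump, 0.5)
s_smooth = m_scale(sample, smooth, 0.5)
```

Bisection is used rather than a root finder because it handles discontinuous $\chi$, where the equality form (1.1') may have no solution.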

The positive random variables $X_i$ are assumed to be independent and identically
distributed, with common distribution $F$ belonging to the $\epsilon$-contaminated family

$$\mathcal{F} = \{F : F = (1-\epsilon)F_0 + \epsilon H\} \qquad (1.2)$$

where $H$ is any distribution function and $0 < \epsilon < 1$, $\epsilon$ fixed. In a later section of the paper,
this setup is modified somewhat to deal with the case of random variables $X_i$ which have a
nominal distribution which is symmetric about an unknown location parameter.

Estimation of scale is an asymmetric problem with regard to bias. Outliers give rise to
asymptotic positive bias whose maximum positive value is denoted by $B^+(\chi)$, whereas
inliers result in a negative bias whose maximum absolute value is denoted by $B^-(\chi)$.
These biases are related to the maximum and minimum asymptotic values of the scale
estimate, $s^+(\chi)$ and $s^-(\chi)$, by $s^+(\chi) = 1 + B^+(\chi)$ and $s^-(\chi) = 1 - B^-(\chi)$,
respectively.

In general $B^+(\chi) \ne B^-(\chi)$, and this asymmetry leads one to introduce a loss function
$l(s^-(\chi), s^+(\chi))$, equivalently a function of $(B^-(\chi), B^+(\chi))$, to be minimized with respect to the choice of $\chi$,
thereby resulting in a min-max bias estimate. We shall call any such estimate a bias-robust
estimate.


It came as a pleasant surprise to find that the maximal biases $B^+(\chi)$ and $B^-(\chi)$, or
equivalently, the maximal and minimal scales $s^+(\chi) = 1 + B^+(\chi)$, $s^-(\chi) = 1 - B^-(\chi)$,
can be obtained in manageable analytic form. It is this manageability which allowed us to
minimize $l(s^-(\chi), s^+(\chi))$, and thereby obtain bias robust estimates for certain classes of
loss functions.

Section 2 of the paper presents the basic theory for the contamination model (1.2) for
positive random variables. In Section 2.1 we show that it suffices to work with point mass
contaminations. In Section 2.2 we establish the maximal and minimal scale expressions
(2.9)-(2.10) for $s^+(\chi)$, $s^-(\chi)$, which are of value in their own right for calculating the
maximal biases of any scale M-estimate. Section 2.3 proves the pleasant result that $s^+(\chi)$
is minimized and $s^-(\chi)$ is simultaneously maximized by a $\chi$ of 0-1 jump function type,
$\chi_a$, with jump location $a$. The resulting estimate is a scaled order statistic. This result is
then used in Section 2.4 to establish the optimal choice of $a$ for loss functions $l$ of a fairly
general form. In Section 2.5, the breakdown point of the bias robust estimates is exhibited.
It is shown that as $\epsilon \to 0.5$, the min-max bias robust estimate is a scaled median, which has
breakdown point 0.5. Thus the scaled median enjoys two global robustness properties
simultaneously.

Section 3 provides some explicit bias, breakdown point and efficiency calculations for
the cases of exponential and positive normal $F_0$. It turns out that for a wide range of
contamination fractions, the bias robust estimate is quite close to the scaled median, both in
form and with regard to actual bias!

Section  4  briefly  treats  maximal  bias  curves,  and  also  the  issue  of  the  finite-sample  size 
relevance  of  our  asymptotic  theory. 

Finally, Section 5 treats the case of estimating the scale of distributions which are
symmetric about an unknown location parameter. The problem is now considerably more
complicated, but we nonetheless show that jump-function type M-estimates are again
optimal under certain conditions. For the case of normal $F_0$ we are able to establish bias
optimality for the range $0 \le \epsilon \le .35$. The bias optimal estimate uses the median for
centering, and is very close to the well-known scaled (for Fisher consistency) median-absolute
deviation about the median (MADM) estimate, both in form and with regard to bias
performance. Thus the scaled MADM is very nearly optimal with regard to both breakdown
point and bias.

Proofs are provided in the body of the paper for the key results, whereas proofs of lesser
results are relegated to Section 6.


2.  THE GENERAL SCALE PROBLEM

2.1  M-estimate  Scale  Functionals  for  Contamination  Models 
The asymptotic value of $s_n$, defined by (1.1), is

$$s(\chi, F) = \inf\Big\{ s : \int \chi\Big(\frac{x}{s}\Big)\,dF(x) \le b \Big\}, \quad F \in \mathcal{F}. \qquad (2.1)$$

The constant $b$ will always be determined by $F_0$ and $\chi$ as

$$b = E_{F_0}\,\chi(X), \qquad (2.2)$$

which insures Fisher consistency: $s(\chi, F_0) = 1$.

We shall work with the class $C$ of $\chi$ functions which satisfy the following conditions:

(A1)  $\chi$ is nondecreasing and bounded on $[0, \infty)$, with at most a finite number of
discontinuities, and $\chi(0) = 0$.

(A2)  $\chi$ is such that $b = b(\chi)$ satisfies the inequalities $\epsilon < b < 1 - \epsilon$.

In view of (A1) and (2.1)-(2.2), there is no loss of generality in making the assumption

(A3)  $\sup_x \chi(x) = 1$.

Lemma 2.1: Under (A1)-(A3) the asymptotic equation (2.1) has a unique solution
$s = s(\chi, F) \in (0, \infty)$ for any $F \in \mathcal{F}$, including the case where $H$ is a point mass $\delta_\infty$ at
$+\infty$ or $\delta_0$ at 0.

REMARK 2.1: It is easy to see that for any unbounded $\chi$, $s(\chi, F) = \infty$ when $H = \delta_\infty$.
Thus, boundedness of $\chi$ is a necessary (but not sufficient) condition to obtain a bounded
bias over $\mathcal{F}$.

REMARK 2.2: The proof of Lemma 2.1 shows that if $b = 1 - \epsilon$, then (2.1) will have only
the solution $s = 0$ when $H$ is concentrated at the origin. This constitutes (implosive)
breakdown of the estimate due to inliers. The proof also reveals that if $b = \epsilon$ and $H$ is a
point mass at infinity, then $s = \infty$ is the only solution and we have (explosive) breakdown
due to outliers. See Hampel (1971), Huber (1981) and Donoho and Huber (1983) for various
definitions of breakdown point.

Consider the family of point-mass contaminations

$$\mathcal{F}_1 = \{F : F = (1-\epsilon)F_0 + \epsilon\delta_y,\ y \in R^+\}$$

where $\delta_y$ is a point mass at $y$ and $R^+$ is the set of extended nonnegative real numbers.
The following lemma shows that the maximum and minimum biases over $\mathcal{F}$ are achieved
over the smaller subfamily $\mathcal{F}_1$.

Lemma 2.2: Suppose (A1) and (A2) hold. Then

$$\inf_{F \in \mathcal{F}_1} s(\chi, F) = \inf_{F \in \mathcal{F}} s(\chi, F)$$

and

$$\sup_{F \in \mathcal{F}_1} s(\chi, F) = \sup_{F \in \mathcal{F}} s(\chi, F).$$


2.2  Minimum  and  Maximum  Scale  Functionals 

Since it suffices to work with $F$ in $\mathcal{F}_1$, it is convenient to replace $s(\chi, F)$ by either
$s(\chi, y)$, or more simply $s_y$, where $y$ is the location of the contamination point mass $\delta_y$.
Let

$$s^+(\chi) = \sup_{y \ge 0} s(\chi, y) \qquad (2.3)$$

denote the maximal scale, and let

$$s^-(\chi) = \inf_{y \ge 0} s(\chi, y) \qquad (2.4)$$

denote the minimal scale. Then the maximum positive bias of $s(\chi, F)$ over $\mathcal{F}$ is

$$B^+ = s^+(\chi) - 1 \qquad (2.5)$$

and the maximum of the absolute values of the negative bias is

$$B^- = 1 - s^-(\chi). \qquad (2.6)$$

Let

$$h_\chi(s) = \int_0^\infty \chi\Big(\frac{x}{s}\Big)\,dF_0(x), \quad 0 < s < \infty \qquad (2.7)$$

and note that for $F \in \mathcal{F}_1$, and any $s > s(\chi, y)$ we have

$$(1-\epsilon)\,h_\chi(s) + \epsilon\,\chi\Big(\frac{y}{s}\Big) - b < 0, \qquad (2.8)$$

where $b = h_\chi(1)$. It is easy to see that $h_\chi$ is continuous, non-negative and strictly
decreasing, with $\lim_{s \to 0} h_\chi(s) = 1$, $\lim_{s \to \infty} h_\chi(s) = 0$. When $\chi$ is continuous, so that $s_y$
solves (2.8) with equality, the following result is almost immediate.


Theorem 2.1: Suppose that (A1) and (A2) hold. Then

$$s^+(\chi) = h_\chi^{-1}\Big(\frac{b-\epsilon}{1-\epsilon}\Big) \qquad (2.9)$$

and

$$s^-(\chi) = h_\chi^{-1}\Big(\frac{b}{1-\epsilon}\Big) \qquad (2.10)$$

with $s^+(\chi)$ achieved by point mass contamination $\delta_y$ with $y \to \infty$, and $s^-(\chi)$ achieved
by point mass contamination $\delta_0$ at the origin.

2.3  Optimality of Jump Function $\chi$’s

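We now suppose that $F_0$ has a density $f_0$. Consider the jump function type $\chi$ given
by

$$\chi_a(x) = \begin{cases} 1, & |x| > a \\ 0, & |x| \le a \end{cases} \qquad (2.11)$$

When $\chi$ in (1.1) is a jump function $\chi_a$, the estimate $s_n$ is a scaled order statistic:

$$s_n = \frac{1}{a}\, x_{(n - [nb])}$$

where $x_{(k)}$ denotes the $k$-th smallest observation and $[\cdot]$ the integer part.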
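The scaled order statistic can be sketched in a few lines. The helper name `jump_scale` and the sample values are hypothetical, and the code assumes the jump function takes the value 1 for $x > a$ and 0 otherwise, with $[nb]$ read as the integer part of $nb$.

```python
import math


def jump_scale(x, a, b):
    """Scaled order statistic x_(n - [n b]) / a for the 0-1 jump function chi_a.

    With chi_a(t) = 1 for t > a and 0 otherwise, the constraint in (1.1) reads
    #{x_i > a s} <= n b, and the smallest s satisfying it is x_(n - [n b]) / a.
    """
    xs = sorted(x)
    n = len(xs)
    k = n - math.floor(n * b)   # 1-based index of the order statistic
    return xs[k - 1] / a


# Illustrative data; a = log 2 with b = 0.5 corresponds to a median-type estimate
# standardized for an exponential nominal model.
sample = [5.0, 1.0, 4.0, 2.0, 3.0]
a_med = math.log(2.0)
s_n = jump_scale(sample, a_med, 0.5)
```

For an odd sample size and $b = 0.5$ the selected order statistic is the sample median, which is the case singled out in Section 2.5.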


Fix $b \in (\epsilon, 1-\epsilon)$, and let

$$C_b = \Big\{ \chi : \chi \in C \text{ and } \int_0^\infty \chi(x) f_0(x)\,dx = b \Big\}. \qquad (2.12)$$

The following lemma shows that if $a$ is appropriately chosen, then for a large class of
densities $f_0$, $\chi_a$ minimizes the maximal scale $s^+ = s^+(\chi)$ over $C_b$, and simultaneously
maximizes the minimal scale $s^- = s^-(\chi)$.


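Lemma 2.3: Let $a$ satisfy

$$1 - F_0(a) = b \qquad (2.13)$$

and suppose that $F_0$ has a positive density $f_0$ with the property that $f_0(sx)/f_0(x)$ is
decreasing in $x$ for $s > 1$ and increasing in $x$ for $s < 1$. Then for every $s > 1$

$$h_{\chi_a}(s) \le h_\chi(s) \quad \forall\, \chi \in C_b \qquad (2.14)$$

and for every $s < 1$

$$h_{\chi_a}(s) \ge h_\chi(s) \quad \forall\, \chi \in C_b. \qquad (2.15)$$

Proof: $\chi_a$ is a member of $C_b$ by virtue of (2.13). For every $\chi \in C_b$ we have

$$\int_0^a \chi(x) f_0(x)\,dx = \int_a^\infty [1 - \chi(x)] f_0(x)\,dx.$$

Then for $s > 1$ we have

$$s\int_0^a \chi(x) f_0(sx)\,dx = s\int_0^a \chi(x) f_0(x)\,\frac{f_0(sx)}{f_0(x)}\,dx
\ \ge\ s\,\frac{f_0(sa)}{f_0(a)} \int_0^a \chi(x) f_0(x)\,dx
\ =\ s\,\frac{f_0(sa)}{f_0(a)} \int_a^\infty [1 - \chi(x)] f_0(x)\,dx
\ \ge\ s\int_a^\infty [1 - \chi(x)] f_0(sx)\,dx.$$

Adding $s\int_a^\infty \chi(x) f_0(sx)\,dx$ to both ends of this chain, for every $\chi \in C_b$ and for every $s > 1$ we have

$$h_\chi(s) = s\int_0^\infty \chi(x) f_0(sx)\,dx \ \ge\ s\int_0^\infty \chi_a(x) f_0(sx)\,dx = h_{\chi_a}(s).$$

For $s < 1$, the above inequalities are reversed. □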


Lemma 2.4: Suppose that $f_0(x)$ is non-increasing, differentiable, and that $-\log f_0(x)$ is
convex. Then the hypotheses of Lemma 2.3 are satisfied.

REMARK 2.3: The exponential and positive normal distributions treated in Section 3 satisfy
Lemma 2.4.


REMARK 2.4: The M-estimate of scale based on $\chi_a$ is a maximum-likelihood estimate
(M.L.E.) of scale based on a density having the form

0 < x

x > a

where x>0 satisfies x3 +x2a2 -a = 0. Of course bounded $\chi$’s can only arise from
M.L.E.’s for densities having tails at least as heavy as the Cauchy tails of $f^*$. While the
distribution $F^*$ corresponding to $f^*$ is not a member of $\mathcal{F}$, there is no particular reason
that it should be for the problem we are treating.

Set

$$\underline{a} = F_0^{-1}(\epsilon) \qquad (2.16)$$

and

$$\bar{a} = F_0^{-1}(1-\epsilon). \qquad (2.17)$$

Then in view of (2.13) and (A2), the jump point $a$ of the jump function $\chi_a$ must satisfy

$$\underline{a} < a < \bar{a}. \qquad (2.18)$$


For $\chi = \chi_a$, let $s^+(a) = s^+(\chi_a)$ and $s^-(a) = s^-(\chi_a)$, and note that
$h_{\chi_a}(s) = 1 - F_0(sa)$. Thus (2.9) and (2.10) give

$$s^+(a) = \frac{1}{a}\,F_0^{-1}\Big(\frac{F_0(a)}{1-\epsilon}\Big) \qquad (2.19)$$

$$s^-(a) = \frac{1}{a}\,F_0^{-1}\Big(\frac{F_0(a)-\epsilon}{1-\epsilon}\Big) \qquad (2.19')$$

Note that as $a \uparrow \bar{a}$, $s^+(a) \to \infty$ and breakdown due to outliers occurs. On the other
hand, when $a \downarrow \underline{a}$, $s^-(a) \to 0$ and breakdown due to inliers occurs. For $a \in (\underline{a}, \bar{a})$ we
clearly have $0 < s^-(a) \le 1$ and $1 \le s^+(a) < \infty$.

For some distributions $F_0$, it may turn out that $s^+(a)$ and $s^-(a)$ are both monotone
increasing functions of $a$ (this is the case for example when $F_0$ is the exponential
distribution treated in Section 3). In such cases we wish to choose $a$ small in order to make
the maximal scale $s^+(a)$ and the maximum positive bias $s^+(a) - 1$ small. At the same
time one wishes to choose $a$ large to make the minimal scale $s^-(a)$ large, thereby making
maximum negative bias $1 - s^-(a)$ small. The precise choice of $a$ will depend upon the
loss function used to assess bias.
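The closed forms (2.19) and (2.19') need only the nominal distribution function and its inverse, so they are straightforward to evaluate. The sketch below is an illustration under stated assumptions: the function `extreme_scales` and the lambda helpers are hypothetical names, and the positive normal distribution is taken to be that of the absolute value of a standard normal variable.

```python
import math
from statistics import NormalDist


def extreme_scales(a, eps, cdf, quantile):
    """Return (s_minus, s_plus) from (2.19') and (2.19) for the jump function chi_a."""
    p = cdf(a)
    if not eps < p < 1.0 - eps:
        raise ValueError("a must lie strictly between F0^{-1}(eps) and F0^{-1}(1 - eps)")
    s_plus = quantile(p / (1.0 - eps)) / a
    s_minus = quantile((p - eps) / (1.0 - eps)) / a
    return s_minus, s_plus


# Exponential nominal model: F0(x) = 1 - exp(-x)
exp_cdf = lambda x: 1.0 - math.exp(-x)
exp_quantile = lambda p: -math.log(1.0 - p)

# Positive normal nominal model: F0(x) = 2 * Phi(x) - 1 for x >= 0
_std = NormalDist()
pn_cdf = lambda x: 2.0 * _std.cdf(x) - 1.0
pn_quantile = lambda p: _std.inv_cdf(0.5 * (1.0 + p))

s_minus_exp, s_plus_exp = extreme_scales(0.716, 0.10, exp_cdf, exp_quantile)
s_minus_pn, s_plus_pn = extreme_scales(0.700, 0.10, pn_cdf, pn_quantile)
```

The values of $a$ used here are the jump locations listed for $\epsilon = 0.10$ in Tables 1 and 2 of Section 3, so the resulting biases can be compared with those tables.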

2.4  Min-Max  Bias  Results  for  Certain  Loss  Functions 

Bias in scale estimation is a fundamentally asymmetric problem, and one may wish to
consider any of a variety of loss functions $l = l(s^-, s^+)$. Most loss functions of interest will
assign infinite loss to $s^+ = \infty$ or $s^- = 0$. Some possibilities are, for $0 < s^- \le 1 \le s^+ < \infty$:

(i)  $l(s^-, s^+) = (s^-)^{-1} + s^+ - 2$

(ii)  $l(s^-, s^+) = \log s^+ - \log s^-$

(iii)  $l(s^-, s^+) = \max\{(s^-)^{-1}, s^+\}$

(iv)  $l(s^-, s^+) = \max\{-\log s^-, \log s^+\}$

Since several considerations lead one to the logarithmic transformation when estimating
scale, the choices (ii) and (iv) may be particularly appealing.

The above possibilities suggest consideration of two distinct classes of loss functions.

$L_1$: All continuous functions $l(u_1, u_2)$ on $(0, 1] \times [1, \infty)$ which are decreasing in
$u_1$, increasing in $u_2$, and such that $\lim_{u_1 \to 0} l(u_1, u_2) = \infty$ for each real $u_2$,
and $\lim_{u_2 \to \infty} l(u_1, u_2) = \infty$ for each real $u_1$.

$L_2$: All functions on $(0, 1] \times [1, \infty)$ of the form
$l(u_1, u_2) = \max\{g_1(u_1), g_2(u_2)\}$ where $g_1$ and $g_2$ are continuous with
$g_1$ strictly monotone decreasing, $g_2$ strictly monotone increasing, and
$\lim_{u_1 \to 0} g_1(u_1) = \lim_{u_2 \to \infty} g_2(u_2) = \infty$.

While $L_2$ is a special case of $L_1$, $L_2$ has enough additional structure to warrant
separate treatment. Note that for $l$ in $L_1$ or $L_2$, $l(s^-(\chi), s^+(\chi))$ represents the
maximum loss, accounting for both positive and negative bias, due to use of a particular $\chi$.
Let

$$L(a) = l(s^-(a), s^+(a)) = l(s^-(\chi_a), s^+(\chi_a))$$

denote the value of the loss for a jump function $\chi_a$.

Theorem 2.2: Suppose that $l \in L_i$, $i = 1$ or $i = 2$. Then under the assumptions of
Lemma 2.3,

$$a_0 = \arg\min_{a \in (\underline{a}, \bar{a})} L(a)$$

exists, and

$$L(a_0) = \min_{\chi \in C}\, l(s^-(\chi), s^+(\chi)).$$

Proof: Since $F_0$ is not only absolutely continuous, but also strictly increasing, $F_0^{-1}$
exists and is continuous. Thus $s^-(a)$ and $s^+(a)$ are continuous functions of $a$.
Furthermore, $s^-(a) \to 0$ as $a \downarrow \underline{a} = F_0^{-1}(\epsilon)$ and $s^+(a) \to \infty$ as $a \uparrow \bar{a} = F_0^{-1}(1-\epsilon)$.
Thus $L(a) \to \infty$ as $a \downarrow \underline{a}$ or $a \uparrow \bar{a}$, and since $L(a)$ is continuous, its minimum is
attained at some $a_0$ in a closed subset of $(\underline{a}, \bar{a})$. It follows from Lemma 2.3 and the properties of $l$
that for each $b \in (\epsilon, 1-\epsilon)$, and correspondingly each $a \in (\underline{a}, \bar{a})$, $l(s^-(\chi), s^+(\chi))$ is
minimized over $C_b$ by the jump function $\chi_a$, with $a$ determined by (2.13). □

Corollary 2.2: Let $l \in L_2$, and let $a_0$ be the optimal $a$ of the theorem.

(i)  Either $a_0$ satisfies the equation

$$g_1(s^-(a_0)) = g_2(s^+(a_0)) \qquad (2.20)$$

or $a_0$ is a local minimum of at least one of the functions $s^-(a)$, $s^+(a)$.

(ii)  If $s^-(a)$ and $s^+(a)$ are both strictly monotone increasing, then $a_0$ uniquely
satisfies (2.20).

Proof: (i) Suppose that $a_0$ does not satisfy (2.20). Then there is a neighborhood
$(a_0 - \delta, a_0 + \delta)$ such that either $L(a) = g_1(s^-(a))$ or $L(a) = g_2(s^+(a))$ on
$(a_0 - \delta, a_0 + \delta)$. Then by definition of $a_0$ it must be a local minimum of either $s^-(a)$ or
$s^+(a)$.

(ii) $g_1(s^-(a))$ is a strictly decreasing function of $a$, which tends to $+\infty$ as $a \downarrow \underline{a}$,
while $g_2(s^+(a))$ is a strictly monotone increasing function of $a$ which tends to $+\infty$ as $a \uparrow \bar{a}$.
Thus the equation $g_1(s^-(a)) = g_2(s^+(a))$ has a unique solution $a_0 \in (\underline{a}, \bar{a})$. □
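Equation (2.20) can be solved numerically by bisection. The sketch below is a hedged illustration for the particular loss (iv), where $g_1(u) = -\log u$ and $g_2(u) = \log u$, using the closed forms (2.19) and (2.19'). The function name `optimal_jump`, the small offset from the interval endpoints, and the lambda helpers are assumptions of this sketch rather than anything specified in the report.

```python
import math
from statistics import NormalDist


def optimal_jump(eps, cdf, quantile, iters=100):
    """Solve -log s^-(a) = log s^+(a) for a in (F0^{-1}(eps), F0^{-1}(1 - eps))."""
    a_lo, a_hi = quantile(eps), quantile(1.0 - eps)

    def gap(a):
        p = cdf(a)
        s_plus = quantile(p / (1.0 - eps)) / a             # (2.19)
        s_minus = quantile((p - eps) / (1.0 - eps)) / a    # (2.19')
        return math.log(s_plus) + math.log(s_minus)        # g2(s^+) - g1(s^-)

    # Stay strictly inside the interval, where both scales are finite and positive.
    width = a_hi - a_lo
    lo, hi = a_lo + 1e-6 * width, a_hi - 1e-6 * width
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid     # inlier bias dominates: move the jump point up
        else:
            hi = mid     # outlier bias dominates: move the jump point down
    return 0.5 * (lo + hi)


# Exponential nominal model
exp_cdf = lambda x: 1.0 - math.exp(-x)
exp_quantile = lambda p: -math.log(1.0 - p)

# Positive normal nominal model (absolute value of a standard normal)
_std = NormalDist()
pn_cdf = lambda x: 2.0 * _std.cdf(x) - 1.0
pn_quantile = lambda p: _std.inv_cdf(0.5 * (1.0 + p))

a0_exp = optimal_jump(0.10, exp_cdf, exp_quantile)
a0_pn = optimal_jump(0.10, pn_cdf, pn_quantile)
bp_exp = min(exp_cdf(a0_exp), 1.0 - exp_cdf(a0_exp))   # breakdown point of the jump estimate
```

Bisection relies only on the sign change of the gap between the two ends of the interval; uniqueness of the root is what part (ii) of the corollary supplies when both scales are strictly increasing.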

2.5  Breakdown  and  Bias  Optimality  of  the  Median 

For discussions of the breakdown point of scale estimates, including M-estimates, see
Hampel et al. (1986) and Huber (1981). Here we comment only on an M-estimate of scale
based on jump functions $\chi_a$ with $a \in (\underline{a}, \bar{a})$.

Let $\epsilon^+$ denote the supremum of all $\epsilon$ for which $s(\chi, y)$, given by (2.1) with $F \in \mathcal{F}_1$,
is finite when $y = \infty$, and let $\epsilon^-$ denote the supremum of all $\epsilon$ for which this $s(\chi, y)$ is
positive when $y = 0$. Then take the breakdown point (BP) of the estimate to be
$BP = \min(\epsilon^-, \epsilon^+)$. For the case of a jump function $\chi_a$, it follows from (2.19)-(2.19') that

$$\epsilon^- = F_0(a) \qquad (2.21)$$

$$\epsilon^+ = 1 - F_0(a) \qquad (2.21')$$

and so

$$BP = \min\{F_0(a),\ 1 - F_0(a)\}.$$

Clearly $BP \le 0.5$ and $BP = 0.5$ only if $F_0(a) = 0.5$. In this case $a = a_{med}$ is the
median of $F_0$, (2.2) gives $b = b_{med} = 0.5$, and the solution of (2.1) is

$$s(\chi_{a_{med}}, F) = \frac{F^{-1}(1/2)}{F_0^{-1}(1/2)} \qquad (2.23)$$

assuming $F^{-1}(1/2)$ exists. In words: the only jump function type M-estimate of scale with
breakdown point $1/2$ is the median functional, standardized by the median of $F_0$ to insure
Fisher consistency.

Now a bias optimal estimate of the type given by Theorem 2.2 or Corollary 2.2 is of
jump function type, with jump location $a_0$ in $(\underline{a}, \bar{a})$. For $\epsilon < 1/2$ we will seldom, if ever,
have $BP = 1/2$ for the optimal $a_0$. Note however that as $\epsilon \uparrow 1/2$,
$\underline{a} = F_0^{-1}(\epsilon) \to F_0^{-1}(1/2)$ and $\bar{a} = F_0^{-1}(1-\epsilon) \to F_0^{-1}(1/2)$, so that $a_0 \to F_0^{-1}(1/2)$. Thus any bias
optimal estimate given by Theorem 2.2 or Corollary 2.2 tends to the median estimate (2.23)
as $\epsilon \to 1/2$. Thus the median, as an estimate of scale of a positive random variable with a
contamination distribution, simultaneously enjoys two global robustness properties: a
maximal breakdown point of one-half and min-max bias optimality.


3.  THE  EXPONENTIAL  AND  POSITIVE  NORMAL  DISTRIBUTIONS 


In this section the loss function $l(s^-, s^+) = \max\{-\log s^-, \log s^+\}$ is used.

3.1  Exponential Distribution

Suppose that our target distribution is exponential with scale parameter $s$, that is

$$f(x, s) = \frac{1}{s}\,f_0\Big(\frac{x}{s}\Big), \quad f_0(x) = e^{-x}. \qquad (3.1)$$

In this particular case $s^+(a)$ and $s^-(a)$ (see Section 2) reduce to

$$s^+(a) = -\frac{1}{a}\,\log\Big(\frac{e^{-a} - \epsilon}{1-\epsilon}\Big) \qquad (3.2)$$

$$s^-(a) = 1 + \frac{1}{a}\,\log(1-\epsilon) \qquad (3.3)$$

for $a \in (\underline{a}, \bar{a})$.

We notice immediately that $s^-(a)$ is strictly increasing with $\lim_{a \to \bar{a}} s^-(a) = 1 - \log(1-\epsilon)/\log\epsilon$,
$\lim_{a \to \underline{a}} s^-(a) = 0$. It is also easy to check that the derivative of $s^+(a)$ is positive (see
Section 5 for details), so that $s^+(a)$ is strictly increasing, with $\lim_{a \to \bar{a}} s^+(a) = \infty$ and
$\lim_{a \to \underline{a}} s^+(a) = \log(1-2\epsilon)/\log(1-\epsilon) - 1$. Therefore Corollary 2.2 is in force.


In Table 1a we show values of the optimal $a_0$, along with $\log s^+(a_0) = -\log s^-(a_0)$,
$1 - s^-(a_0)$, and $s^+(a_0) - 1$ for several values of $\epsilon$. Breakdown points and efficiencies at
the nominal model are also given. We note that the breakdown points of the optimal
estimators are fairly high (at least 0.46), even when they are designed for contamination
models with small fractions of contamination!

For the sake of comparison with the optimal estimator, Table 1b displays corresponding
values of $-\log s^-$, $\log s^+$, $1 - s^-$ and $s^+ - 1$ for the scaled median estimator
$s_n = \mathrm{med}/.693$.


Table 1  Min-Max and Median Estimates for Exponential Distribution

(a)  Min-Max Estimate

  $\epsilon$    $a_0$    $\log s^+(a_0)$   $1 - s^-(a_0)$   $s^+(a_0) - 1$   Breakdown Point   Efficiency
  0.10   0.716      0.159           0.147           0.173            0.49           0.49
  0.20   0.739      0.359           0.302           0.432            0.48           0.50
  0.30   0.761      0.632           0.469           0.882            0.47           0.51
  0.40   0.773      1.080           0.661           1.946            0.46           0.51
  0.45   0.763      1.530           0.783           3.618            0.47           0.48

(b)  Scaled Median Estimate ($a$ = .693)

  $\epsilon$    $-\log s^-$   $\log s^+$   $1 - s^-$   $s^+ - 1$   Breakdown Point   Efficiency
  0.10                  0.157       0.152                     0.50           0.48
  0.20                  0.347       0.322       0.415         0.50           0.48
  0.30                  0.592       0.515                     0.50           0.48
  0.40     1.335        0.950       0.737       1.585         0.50           0.48
  0.45     1.984        1.241       0.862       2.459         0.50           0.48


While the scaled median is actually less sensitive to outliers than the
optimal estimate, it is sufficiently more sensitive to inliers that it is somewhat dominated by
the optimal estimate using the loss $\max\{-\log s^-, \log s^+\}$. As one might expect, both
estimates are more sensitive to outliers than inliers on the raw scale. Note also that the
asymptotic efficiency of .48 for the scaled median at the nominal model is only slightly
smaller than that of the min-max bias estimates displayed in Table 1a. In summary we can
state: the scaled median is a nearly optimal estimate of scale in terms of asymptotic bias.

Table 2 displays similar results for the positive normal distribution, whose density is
$f_{pn}(x) = 2\phi(x)$, $x \ge 0$, where $\phi(x)$ is the standard normal density. The results are rather
similar to those for the exponential distribution. Again: the scaled median (in this case
$\mathrm{med}/.675$) is a rather good approximation to the optimal bias-robust estimate.


Table 2  Min-Max and Median Estimates for Positive Normal Distribution

(a)  Min-Max Estimate

  $\epsilon$    $a_0$    $\log s^+(a_0)$   $1 - s^-(a_0)$   $s^+(a_0) - 1$   Breakdown Point   Efficiency
  0.10   0.700      0.127           0.120           0.136            0.48           0.38
  0.20   0.726      0.284           0.247           0.329            0.47           0.40
  0.30   0.750      0.494           0.390           0.639            0.45           0.41
  0.40   0.762      0.844           0.570           1.325            0.45           0.42
  0.45   0.746      1.236           0.710           2.441            0.46           0.41

(b)  Scaled Median Estimate ($a$ = .675)

  $s^+ - 1$   Breakdown Point   Efficiency
                                  0.37
                                  0.37
                                  0.37
                                  0.37
                                  0.37


4.  MAXIMAL BIAS CURVES AND FINITE SAMPLE SIZE RELEVANCE


It  should  be  noted  that,  min-max  bias  optimality  results  aside,  the  calculation  of 
maximal bias curves (i.e., plots of maximal bias of a given estimate for each fraction $\epsilon$ of
contamination, versus $\epsilon$) for contamination models for any proposed robust estimate is a
very worthwhile goal (see Sec. 2.7 of Hampel et al., 1986). It was a pleasant surprise to
discover  that  for  a  wide  range  of  estimation  problems  (in  addition  to  estimation  of  scale),  and 
types  of  estimate,  one  can  usually  obtain  an  analytic  expression  from  which  one  can 
calculate  the  maximal  bias  curves. 


For example, (2.9) and (2.10) of Theorem 2.1 lead to the maximal biases (2.5) and
(2.6), for outliers and inliers, respectively. It is not always possible to express the inverse
$h_\chi^{-1}$ in explicit analytic form (e.g., it is possible for the case of exponential $F_0$, but not for
the positive normal $F_0$ in Section 3). Nonetheless, the expression (2.7) for $h_\chi(s)$ is nice
enough (as are analogous expressions for other problems such as location, regression, etc.),
that one can always solve numerically for $s^+(\chi)$ and $s^-(\chi)$ by computing $h_\chi(s)$ for a
grid of $s$ values, and interpolating to match with $(b-\epsilon)/(1-\epsilon)$ and $b/(1-\epsilon)$,
respectively.
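The grid-and-interpolate recipe just described might be sketched as follows. Everything named here is an assumption made for illustration: the bounded function `chi_huber_type` (a squared-Huber shape rescaled so that its supremum is 1), the midpoint integration rule, the truncation point `upper`, and the choice of an exponential nominal density, for which $h_\chi$ has a closed form that can be used as a check.

```python
import math


def chi_huber_type(t, c=1.5):
    """Illustrative bounded chi: t^2 / c^2 up to t = c, then constant 1."""
    return min(t * t, c * c) / (c * c)


def h(chi, s, density, upper=40.0, m=2000):
    """Midpoint-rule approximation of h_chi(s) = int_0^inf chi(x / s) f0(x) dx."""
    dx = upper / m
    total = 0.0
    for k in range(m):
        x = (k + 0.5) * dx
        total += chi(x / s) * density(x)
    return total * dx


def invert(s_grid, h_grid, target):
    """Linearly interpolate the decreasing curve h(s) to find s with h(s) = target."""
    for i in range(len(s_grid) - 1):
        h0, h1 = h_grid[i], h_grid[i + 1]
        if h0 >= target >= h1 and h0 > h1:
            w = (h0 - target) / (h0 - h1)
            return s_grid[i] + w * (s_grid[i + 1] - s_grid[i])
    raise ValueError("target lies outside the range covered by the grid")


def maximal_scales(chi, density, eps, s_grid):
    """Return (b, s_minus, s_plus) using (2.2), (2.10) and (2.9)."""
    b = h(chi, 1.0, density)
    h_grid = [h(chi, s, density) for s in s_grid]
    s_plus = invert(s_grid, h_grid, (b - eps) / (1.0 - eps))
    s_minus = invert(s_grid, h_grid, b / (1.0 - eps))
    return b, s_minus, s_plus


exp_density = lambda x: math.exp(-x)
s_grid = [0.05 * k for k in range(1, 201)]          # s from 0.05 to 10
b, s_minus, s_plus = maximal_scales(chi_huber_type, exp_density, 0.10, s_grid)
bias_plus, bias_minus = s_plus - 1.0, 1.0 - s_minus   # (2.5) and (2.6)
```

Repeating the last call over a range of `eps` values would trace out a maximal bias curve for this particular $\chi$; a finer grid or a higher-order interpolant could be substituted if more accuracy is needed.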


We have done this for the following two estimators in the case of the positive normal
$F_0$: (i) the median; and (ii) Huber’s M-estimate of scale based on

$$\chi(t) = \psi_{H,c}^2(t) = \begin{cases} t^2, & 0 \le t \le c \\ c^2, & t > c. \end{cases}$$

The results are shown in Figure 1 for $c = 1.5$, which results in approximately 50% efficiency
for the Huber estimate, at the nominal positive normal distribution (where the scaled median
has 37% efficiency). The Huber estimate is much more sensitive to outliers than the
median, as reflected in the fact that $B_H^+ > B_{med}^+$ for all $\epsilon$. Furthermore, the Huber
estimate has a breakdown point toward outliers of only .29 (whereas that of the scaled
median is .5). On the other hand the Huber estimate is less sensitive than the median to
inliers, and accordingly $B_H^- < B_{med}^-$ for all $\epsilon$. Also, the breakdown point of the Huber
estimate to inliers is .71 (as compared with .5 for the scaled median).

Finally,  one  should  be  concerned  about  the  relevance  of  an  asymptotic  min-max  bias 
theory for finite sample situations. Here one is naturally interested in the sample size $n(\epsilon)$
at which squared bias is equal to the variance. One can get a rough idea of what $n(\epsilon)$ will
be by acting as if the asymptotic bias and variance project well back to finite sample sizes.

For example, consider the positive bias $B^+(\epsilon) = s^+(\epsilon) - 1$ of the min-max scale
estimate for the exponential $F_0$ (see Section 3). This bias is achieved by point mass
contamination $\delta_\infty$ at infinity. The corresponding asymptotic variance is easily shown to be

$$V^+(\epsilon) = \frac{b(a_0)\,[1 - b(a_0)]}{(1-\epsilon)^2\, a_0^2\, e^{-2 s^+ a_0}}$$

where $s^+ = s^+(\epsilon)$, $a_0 = a_0(\epsilon)$. Some values of $n^+(\epsilon) = [V^+(\epsilon)]^{1/2}/B^+(\epsilon)$ are shown in
Table 3.


Table 3: Sample Size $n(\epsilon) = [V^+(\epsilon)]^2 / B^+(\epsilon)$
for Matching Variance and Squared Bias
(Min-Max Scale Estimate for Exponential $F_0$)

    $\epsilon$        .02    .05    .1     .15    .2
    $n^+(\epsilon)$   50     20     10     7      6

Evidently,  the  asymptotic  theory  is  relevant  down  to  quite  small  sample  sizes  unless  the 
fraction  of  contamination  is  exceedingly  small.  More  precise  finite-sample  size  calculations 
should  be  carried  out  to  confirm  this. 


5. SCALE FOR SYMMETRIC DISTRIBUTIONS WITH NUISANCE LOCATION PARAMETER


Suppose now that our observations $X_i$ are not restricted to being positive, and have a location-scale family distribution $F((x-\mu)/s)$. We now assume that

$$\mathcal{F} = \{F : F = (1-\epsilon)F_0 + \epsilon H,\ F_0 \text{ symmetric}\} \qquad (5.0)$$


and $H$ is again any distribution function. One now has to estimate the nuisance location parameter $\mu$, and so let $\mu_n$ be any location and scale equivariant estimate of location. Then the M-estimate of scale $s_n$ is given by

$$s_n = \inf\Big\{ s : \frac{1}{n}\sum_{i=1}^{n} \chi\Big(\frac{|x_i - \mu_n|}{s}\Big) \le b \Big\} \qquad (5.1)$$


where $\chi$ is as in Section 2. The corresponding asymptotic functional is

$$s(\chi,F) = \inf\Big\{ s : \int \chi\Big(\frac{|x - \mu(F)|}{s}\Big)\,dF(x) \le b \Big\} \qquad (5.2)$$

Here $\mu_n$ is assumed to have an asymptotic functional representation $\mu(F)$, and $b$ is given by (2.2). Existence of a unique solution $s(\chi,F) \in (0,\infty)$ follows under (A1)-(A3) and finiteness of $\mu(F)$, just as in Lemma 2.1. Without loss of generality we take $\mu(F_0) = 0$ and $s(\chi,F_0) = 1$. Then $\mu(F)$ is the bias in estimating $\mu$ and $s(\chi,F) - 1$ is the bias in estimating $s$.
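As an illustration of the sample estimate (5.1), the sketch below takes the median as the location estimate and a jump function $\chi_a(x) = 1$ for $x > a$, $0$ otherwise, with $b = .5$. Both choices are assumptions made for the example, and `jump_m_scale` is a hypothetical name. In this special case the infimum is an order statistic of the absolute residuals divided by $a$, i.e. a scaled median absolute deviation.

```python
import math
import statistics

def jump_m_scale(xs, a=0.6745, b=0.5):
    """Smallest s with (1/n) * #{ |x_i - median| / s > a } <= b."""
    mu_n = statistics.median(xs)               # location estimate (assumed choice)
    resid = sorted(abs(x - mu_n) for x in xs)  # ordered absolute residuals
    n = len(resid)
    allowed = math.floor(n * b + 1e-12)        # residuals allowed to exceed a * s
    if allowed >= n:
        return 0.0
    # The threshold a * s must reach the (n - allowed)-th smallest residual.
    return resid[n - allowed - 1] / a

sample = [1.0, 2.0, 3.0, 4.0, 100.0]
s_n = jump_m_scale(sample)
```

The gross value 100.0 moves neither the median nor the selected order statistic, which is the bounded-influence behavior the asymptotic bias theory quantifies.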


Let $X$ be a random variable with distribution $F_X$, and let $F_{-X}$ denote the distribution of $-X$. Then most $\mu(F)$ of interest will satisfy the symmetry property

(L1) $\mu(F_{-X}) = -\mu(F_X)$.

Suppose that $s$ is given by (5.2), with $X \sim F_X$ resulting in $\mu(F_X) < 0$. Then in view of (5.0) and (L1), there exists $F_{-X}$ in $\mathcal{F}$ with $\mu(F_{-X}) = -\mu(F_X) > 0$ and $s(\chi,F_{-X}) = s(\chi,F_X)$. Thus, it suffices under (L1) to restrict attention to the sub-family

$$\mathcal{F}^+ = \{F \in \mathcal{F} : \mu(F) \ge 0\} .$$

In addition we will use the assumption

(L2) For all $F \in \mathcal{F}^+$, $\mu(F) \le \bar\mu < \infty$ where $\bar\mu = \mu((1-\epsilon)F_0 + \epsilon\delta_\infty)$.

REMARK 5.1: Any location functional $\mu(F)$ with breakdown point greater than $\epsilon$ such that $\bar\mu = \sup_{F \in \mathcal{F}^+} \mu(F)$ will satisfy (L2). For example, location M-estimates based on bounded, monotone psi-functions will satisfy (L2).

Let

$$h(s,t) = \int_{-\infty}^{\infty} \chi\Big(\frac{|x-t|}{s}\Big)\, f_0(x)\,dx . \qquad (5.3)$$

By change of variable we have

$$h(s,t) = s\int_0^{\infty} \chi(x)\,\{ f_0(xs - t) + f_0(xs + t) \}\,dx . \qquad (5.4)$$


Lemma 5.1: Let $\chi$ satisfy (A1). In addition suppose that the density $f_0$ is not only symmetric, but also decreasing on $[0,\infty)$. Then for each fixed $s > 0$, $h(s,t)$ is increasing in $t$ on $[0,\infty)$.
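A quick numerical illustration of Lemma 5.1, under assumptions chosen for the example: $f_0$ is the standard normal density and $\chi$ is the jump function at $a = .6745$, so that $h(s,t) = P(|X - t| > as)$ has a closed form. The grid of $t$ values and the three values of $s$ are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()

def h(s, t, a=0.6745):
    """h(s, t) = P(|X - t| > a * s) for X ~ N(0, 1): (5.3) with a jump function at a."""
    return 1.0 - PHI.cdf(t + a * s) + PHI.cdf(t - a * s)

def is_increasing_in_t(s, t_grid, a=0.6745):
    """Check that h(s, .) does not decrease along an increasing grid of t >= 0."""
    values = [h(s, t, a) for t in t_grid]
    return all(v1 >= v0 for v0, v1 in zip(values, values[1:]))

t_grid = [0.05 * k for k in range(0, 61)]  # t = 0, 0.05, ..., 3
checks = [is_increasing_in_t(s, t_grid) for s in (0.5, 1.0, 2.0)]
```

A grid check of this kind only illustrates the lemma for one $f_0$ and one $\chi$; it is not a substitute for the proof in Section 6.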


Let

$$h_{\chi,\mu}(s,F) = (1-\epsilon)\,h(s,\mu(F)) + \epsilon\int \chi\Big(\frac{|x-\mu(F)|}{s}\Big)\,dH(x) \qquad (5.5)$$

and let $s_\mu^+(\chi)$ and $s_\mu^-(\chi)$ denote the largest and smallest values of $s(\chi,F)$, as $F$ ranges over $\mathcal{F}^+$. Here the subscript $\mu$ denotes dependence on the form of the location functional $\mu = \mu(\cdot)$. $h_{\chi,\mu}(s,F)$ is a decreasing function of $s$, and so $s_\mu^+(\chi)$ is achieved by a possibly substochastic contamination distribution $H$ which makes $h_{\chi,\mu}(s,F)$ as large as possible for $s > 1$, and $s_\mu^-(\chi)$ is achieved by an $H$ which makes $h_{\chi,\mu}(s,F)$ as small as possible for $s < 1$. Even though $\mu(F)$ is now involved, it follows from Lemma 5.1 that point mass contamination at infinity achieves $s_\mu^+(\chi)$, and point mass contamination at the origin achieves $s_\mu^-(\chi)$.

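Let $h_{\bar\mu}^{-1}(t)$ denote the inverse of $h(s,\bar\mu)$ with respect to the first argument, with the second argument fixed at $\bar\mu = \mu((1-\epsilon)F_0 + \epsilon\delta_\infty)$, and define $h_0^{-1}(t)$ similarly in terms of $h(s,0)$.

Theorem 5.1: Suppose that (A1)-(A3) and (L1)-(L2) hold. Then

$$s_\mu^+(\chi) = h_{\bar\mu}^{-1}\Big(\frac{b-\epsilon}{1-\epsilon}\Big) \qquad (5.7)$$

$$s^-(\chi) = h_0^{-1}\Big(\frac{b}{1-\epsilon}\Big) \qquad (5.8)$$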

REMARK 5.2: The maximal scale $s_\mu^+(\chi)$ depends on the location functional $\mu = \mu(\cdot)$ through $\bar\mu$. On the other hand the minimal scale $s^-(\chi)$ is independent of the choice of the location functional $\mu(F)$, which is why we denote it simply $s^-(\chi)$.

Since the median minimizes the maximum bias $\bar\mu$ of any equivariant location estimate over $\mathcal{F}^+$ (Huber, 1964), Lemma 5.1 gives the following important corollary.

Corollary 5.1: The median minimizes $s_\mu^+(\chi)$ over all location functionals $\mu = \mu(\cdot)$ which satisfy (L1) and (L2).
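The inversions in (5.7) and (5.8) are easy to carry out numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: $F_0$ is standard normal, the location functional is the median (whose maximal bias at the normal is $\Phi^{-1}(1/(2(1-\epsilon)))$), and $\chi$ is the jump function at $a = .675$. Bisection is used in place of whatever search the authors employed, and all function names are hypothetical.

```python
from math import log
from statistics import NormalDist

PHI = NormalDist()

def h(s, t, a):
    """h(s, t) = P(|X - t| > a * s) for X ~ N(0, 1) and a jump function at a."""
    return 1.0 - PHI.cdf(t + a * s) + PHI.cdf(t - a * s)

def inverse_in_s(target, t, a, lo=1e-8, hi=50.0):
    """Solve h(s, t) = target for s by bisection (h is decreasing in s)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid, t, a) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def extreme_scales(eps, a=0.675):
    """Return (s_minus, s_plus, mu_bar) with the median as centering functional."""
    b = h(1.0, 0.0, a)                        # calibration: s = 1 at F0
    mu_bar = PHI.inv_cdf(0.5 / (1.0 - eps))   # maximal bias of the median
    s_plus = inverse_in_s((b - eps) / (1.0 - eps), mu_bar, a)   # (5.7)
    s_minus = inverse_in_s(b / (1.0 - eps), 0.0, a)             # (5.8)
    return s_minus, s_plus, mu_bar

s_minus, s_plus, mu_bar = extreme_scales(0.05)
```

The sketch requires $\epsilon < b$ so that the target in (5.7) is positive; with $a = .675$ this means $\epsilon$ below roughly one half.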


The results of Section 2.4 hinge on the key Lemma 2.3, the proof of which relies on the hypothesis that $k_s(x) = f_0(sx)/f_0(x)$ is decreasing for $s > 1$, and increasing for $s < 1$. In the present context we still have $k_s(x) = f_0(sx)/f_0(x)$ for $s < 1$, but for $s > 1$ $k_s(x)$ is replaced by

$$k_{s,\bar\mu_{med}}(x) = \frac{f_0(sx - \bar\mu_{med}) + f_0(sx + \bar\mu_{med})}{f_0(x)} \qquad (5.9)$$

where $\bar\mu_{med}$ is now the maximal bias of the median. Unfortunately, $k_{s,\bar\mu_{med}}(x)$ is not decreasing for all $s$ and all values of $\bar\mu_{med}$, and this complicates matters somewhat. However, we can proceed as follows.


First, note that the same type of argument used in the proof of Theorem 2.2 shows that there is an optimal jump function $\chi_{a^*}$ among the class of all jump functions $\chi_a$. That is, for $l \in L_i$, $i = 1$ or $i = 2$, there is an $a^* \in (0,\infty)$ such that

$$L(a^*) \le l(s^-(\chi_a),\, s^+(\chi_a)), \qquad a \in (0,\infty) \qquad (5.10)$$

Let $s_\epsilon^+ = s^+(a^*)$, $s_\epsilon^- = s^-(a^*)$, and set

$$k_\epsilon(x) = \frac{f_0(s_\epsilon^+ x - \bar\mu_{med}) + f_0(s_\epsilon^+ x + \bar\mu_{med})}{f_0(x)} \qquad (5.11)$$

where the subscript now indicates the dependence on $\epsilon$ through $s_\epsilon^+$ and $\bar\mu_{med}$, each of which depends on $\epsilon$. Note that the conclusion of Lemma 2.3 holds when $k_\epsilon(x)$ is decreasing (just replace $f_0(sx)/f_0(x)$ by $k_\epsilon(x)$ in the proof). In applications there will always be an interval $[0,\epsilon_0)$ for which $k_\epsilon(x)$ is decreasing for $\epsilon \in [0,\epsilon_0)$. The normal distribution provides a concrete example of this, as we see shortly.


The next theorem gives conditions under which the optimality of $\chi_{a^*}$ over the class of all jump functions extends to $C$.

Theorem 5.2: Let $a^*$ be as in (5.10), so that $\chi_{a^*}$ is optimal over all jump functions. Suppose that $k_\epsilon(x)$ is decreasing and let $l \in L_2$. If $s^+(a)$, $s^-(a)$ are both strictly increasing, or if the derivatives of $s^+(a)$, $s^-(a)$ are non-vanishing at $a^*$, then

$$L(a^*) = \min_{\chi \in C}\, l(s^-(\chi),\, s^+(\chi)) .$$

Proof: Choose any $\chi \in C$, and fix $b$ corresponding to this $\chi$, namely $b = E_{F_0}\chi(X)$. Let $\chi_a$ be the unique jump function which corresponds to this particular $b$. Let $h_\chi(s_\epsilon^+)$ denote $h(s_\epsilon^+, \bar\mu_{med})$ for the given $\chi$. If $h_\chi(s_\epsilon^+) \ge (b-\epsilon)/(1-\epsilon)$, then from (5.7) for both $\chi$ and $\chi_{a^*}$, and the monotonicity of $h(\cdot,\bar\mu_{med})$, $\chi$ is not better than $\chi_{a^*}$, i.e., $s^+(\chi) \ge s^+(\chi_{a^*})$. So, suppose that $h_\chi(s_\epsilon^+) < (b-\epsilon)/(1-\epsilon)$, i.e., suppose $\chi$ is strictly better than $\chi_{a^*}$ for $s^+$, i.e., $s^+(\chi) < s_\epsilon^+ = s^+(a^*)$. We will show that then $\chi$ must be worse than $\chi_a$ for $s^-$, and hence worse than $\chi_{a^*}$ for $s^-$. Since $k_\epsilon(x)$ is decreasing, the conclusion of Lemma 2.3 applies at the point $s = s_\epsilon^+$. Thus, the jump function $\chi_a$ is optimal at the point $s = s_\epsilon^+$, and so $s^+(a) \le s^+(\chi) < s_\epsilon^+$. In view of Corollary 2.2, and the optimality of $\chi_{a^*}$, we have

$$L(a^*) \le \max\{ g_1(s^-(a)),\, g_2(s^+(a)) \}, \qquad a \in (0,\infty)$$

and it follows that $s^-(a) \le s_\epsilon^-$. For the particular $b$ in question the jump function $\chi_a$ is optimal for all $s < 1$, and so we have

$$s^-(\chi) \le s^-(a) \le s_\epsilon^- .$$

Thus $\chi_{a^*}$ is as good as any $\chi \in C$. □
The Normal Distribution

We now apply this general result to the case of the normal distribution and the loss function $l(s^-,s^+) = \max\{-\log s^-, \log s^+\}$.

The optimal value of $a$ is obtained for each $\epsilon$ by minimizing $\max\{-\log s^-(a), \log s^+(a)\}$ for $a \in (\underline{a}, \bar{a})$, where $s^+(a)$ is given by (5.7) with $\chi = \chi_a$ and $\bar\mu = \bar\mu_{med}$, and $s^-(a)$ is given by (5.8). This cannot be done analytically, since $h(\cdot,t)$ given by (5.4) does not admit a closed form inverse when $f_0$ is normal. We determined $a^* = a^*(\epsilon)$ by numerical search for $\epsilon = .05, .5\,(.05)$, and evaluated the corresponding values $s^+(a^*)$, $s^-(a^*)$ and $\bar\mu_{med} = \bar\mu_{med}(\epsilon)$. These are displayed in Table 4a for $\epsilon = .05, .35\,(.05)$. With the help of Lemma 6.1 (see end of Section 6) we have numerically verified that the conditions of Theorem 5.2 hold, and therefore $\chi_{a^*}$ is optimal for these values of $\epsilon$.

Based on further numerical checks, we conjecture that $\chi_{a^*}$ is optimal for all $\epsilon \in (0, .35)$. On the other hand a calculation showed that $k_\epsilon(x)$ is not monotone for $\epsilon = .4$, and thus we cannot conclude that $\chi_{a^*}$ is optimal for all values of $\epsilon \in (0, .5)$, at least not by the current method of proof. The question of optimality of $\chi_{a^*}$ for the full range of $\epsilon$ in $(0, .5)$ remains an open problem.

The min-max bias values $\log s^+(a^*) = -\log s^-(a^*)$ are shown in Table 4a, as are the negative and positive biases $1 - s^-(a^*)$ and $s^+(a^*) - 1$ on the raw scale. Breakdown points and efficiencies are also displayed.

As in the case of the solutions for exponential and positive normal distributions in Section 3, we find that $a^*$ is quite insensitive to the value of $\epsilon$, and not too different from those for the positive normal distribution. However, the actual min-max biases $\log s^+(a^*)$ (and also the corresponding raw biases) are substantially higher, as one might expect, due to the nontrivial bias of the median as centering functional.

Table 4b displays similar quantities for the scaled median absolute deviation about the median (MADM) estimate. The comparison of Tables 4a and 4b shows that even though the scaled MADM is not optimal for the given loss function and values of $\epsilon$, it is strikingly close to optimal for the range of $\epsilon$ values considered (the performance of the scaled MADM is much nearer to that of the optimal estimate than was that of the scaled median for the positive normal distribution treated in Section 3.2). Except for $\epsilon = .35$, the common entries of Tables 4a and 4b agree to at least two decimal places, and at $\epsilon = .35$ the differences are only about .01. The breakdown points of the min-max estimates round to .500 for all $\epsilon$ values shown. In effect a single table would suffice, and we are led to assert: For all practical purposes the scaled MADM is the min-max bias estimate for $\epsilon = .05, .35\,(.05)$ (and probably for all $\epsilon \in (0, .35)$).
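The numerical search for the optimal jump point can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes the jump-function family $\chi_a$, the median as centering functional with maximal bias $\Phi^{-1}(1/(2(1-\epsilon)))$ at the normal, a plain grid search over a hypothetical range of $a$, and bisection for the inverses in (5.7) and (5.8).

```python
from math import log
from statistics import NormalDist

PHI = NormalDist()

def h(s, t, a):
    """h(s, t) = P(|X - t| > a * s) for X ~ N(0, 1) and a jump function at a."""
    return 1.0 - PHI.cdf(t + a * s) + PHI.cdf(t - a * s)

def inverse_in_s(target, t, a, lo=1e-8, hi=50.0):
    """Solve h(s, t) = target for s by bisection (h is decreasing in s)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h(mid, t, a) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_biases(a, eps):
    """Return (-log s_minus(a), log s_plus(a)) for contamination fraction eps."""
    b = h(1.0, 0.0, a)
    mu_bar = PHI.inv_cdf(0.5 / (1.0 - eps))   # maximal bias of the median
    s_plus = inverse_in_s((b - eps) / (1.0 - eps), mu_bar, a)
    s_minus = inverse_in_s(b / (1.0 - eps), 0.0, a)
    return -log(s_minus), log(s_plus)

def best_jump_point(eps, a_grid):
    """Grid point minimizing the loss max{-log s_minus, log s_plus}."""
    return min(a_grid, key=lambda a: max(log_biases(a, eps)))

a_grid = [0.60 + 0.001 * k for k in range(0, 151)]  # a = 0.600, ..., 0.750
a_star_05 = best_jump_point(0.05, a_grid)
a_star_20 = best_jump_point(0.20, a_grid)
```

Because one of the two log biases decreases in $a$ while the other increases near the optimum, the minimizer sits close to the point where they cross, so a root search on their difference would be a reasonable alternative to the grid.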
Table 4: Normal Distribution with Location Unknown

(a) Min-Max Estimate

    $\epsilon$   $\bar\mu_{med}$   $a^*$    $\log s^+(a^*)$   $1-s^-(a^*)$   $s^+(a^*)-1$   Breakdown Point   EFF
    0.05         0.066             0.674    0.063             0.061          0.065          0.50              0.37
    0.10         0.139             0.673    0.135             0.126          0.145          0.50              0.37
    0.15         0.223             0.673    0.221             0.198          0.247          0.50              0.37
    0.20         0.318             0.673    0.324             0.276          0.382          0.50              0.37
    0.25         0.430             0.673    0.450             0.362          0.568          0.50              0.37
    0.30         0.566             0.676    0.609             0.456          0.839          0.50              0.37
    0.35         0.736             0.682    0.812             0.556          1.252          0.50              0.37

(b) Scaled MADM Estimate ($a = .675$)

    $\epsilon$   $-\log s^-$   $\log s^+$   $1-s^-$   $s^+-1$   Breakdown Point   Efficiency
    0.05         0.063         0.063        0.061     0.065     0.50              0.37
    0.10         0.135         0.135        0.126     0.145     0.50              0.37
    0.15         0.220         0.221        0.197     0.247     0.50              0.37
    0.20         0.322         0.324        0.276     0.383     0.50              0.37
    0.25         0.449         0.450        0.362     0.569     0.50              0.37
    0.30         0.612         0.608        0.458     0.838     0.50              0.37
    0.35         0.833         0.808        0.565     1.243     0.50              0.37
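6. PROOFS OF RESULTS

Proof of Lemma 2.1

Proof: For $F \in \mathcal{F}$, we have

$$h_\chi(s) = \int \chi\Big(\frac{x}{s}\Big)\,dF(x) = (1-\epsilon)\int \chi\Big(\frac{x}{s}\Big)\,dF_0(x) + \epsilon\int \chi\Big(\frac{x}{s}\Big)\,dH(x) .$$

Since for any distribution function $H$, including the point mass at infinity, we have $\lim_{s\to\infty} h_\chi(s) \le \epsilon < b$, it suffices to show that $\lim_{s\to 0^+} h_\chi(s) > b$. Since

$$h_\chi(s) \ge (1-\epsilon)\int_0^\infty \chi\Big(\frac{x}{s}\Big)\,dF_0(x)$$

for any $H$, with equality when $H$ is concentrated at the origin,

$$\lim_{s\to 0^+} h_\chi(s) \ge (1-\epsilon)\lim_{s\to 0^+}\int_0^\infty \chi\Big(\frac{x}{s}\Big)\,dF_0(x) = (1-\epsilon) > b .$$

Thus $s(F,\chi) \in (0,\infty)$ for all $H$. □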


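Proof of Lemma 2.2

Proof: We'll show that for all $F \in \mathcal{F}$ there exist $F_1, F_2 \in \mathcal{F}_r$ such that $s(\chi,F_1) \le s(\chi,F) \le s(\chi,F_2)$. Let $\chi \in C$ and $\bar{s} = s(\chi,F)$ be defined as in (2.1), with $F = (1-\epsilon)F_0 + \epsilon H$.

If $E_H\{\chi(X/\tilde{s})\} = 1$ for some $\tilde{s} < \bar{s}$, then $\chi(X/\tilde{s}) = 1$ a.s. $[H]$ and there exists $y \in R$ such that $\chi(y/\tilde{s}) = 1$. Let $F_2 = (1-\epsilon)F_0 + \epsilon\delta_y$; for all $s' \le \tilde{s}$ we have

$$E_{F_2}\{\chi(X/s')\} = (1-\epsilon)h_\chi(s') + \epsilon = E_F\{\chi(X/s')\} > b,$$

therefore $s(F_2,\chi) \ge \tilde{s}$. If $E_H\{\chi(X/\tilde{s})\} < 1$ for some $0 < \tilde{s} < \bar{s}$ (and then for all $\tilde{s} \le s' \le \bar{s}$), let $y \in R$ be such that $\chi(y/\tilde{s}) > E_H\{\chi(X/\tilde{s})\}$ and let $F_2$ be as defined above; for some $\tilde{s} < s' < \bar{s}$, by continuity of $h_\chi$ we have

$$E_{F_2}\{\chi(X/\tilde{s})\} = (1-\epsilon)h_\chi(\tilde{s}) + \epsilon\chi(y/\tilde{s}) > (1-\epsilon)h_\chi(s') + \epsilon E_H\{\chi(X/s')\} > b .$$

Therefore $s(\chi,F_2) \ge \tilde{s}$.

On the other hand, let $F_1 = (1-\epsilon)F_0 + \epsilon\delta_0$. For all $s > \bar{s}$ we have

$$E_{F_1}\{\chi(X/s)\} = (1-\epsilon)h_\chi(s) \le (1-\epsilon)h_\chi(s) + \epsilon E_H\{\chi(X/s)\} = E_F\{\chi(X/s)\} < b,$$

hence $s(\chi,F_1) = \inf\{s : E_{F_1}\{\chi(X/s)\} \le b\} \le \bar{s}$. □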


(6.1) 


(6.2) 


$$h^{-1}\Big(\frac{b}{1-\epsilon}\Big) \le s_y \le h^{-1}\Big(\frac{b-\epsilon}{1-\epsilon}\Big)$$

For $y = 0$, the left-hand inequality becomes an equality. Now note that $s_y$ satisfies the monotonicity property $s_{y_1} \ge s_{y_2}$ when $y_1 > y_2$. Thus if $y_n$ is a sequence of values tending to infinity, the limit $\bar{s} = \lim_{n\to\infty} s_{y_n}$ exists. We claim that for $s_y = \bar{s}$, the right-hand inequality in (2.11) becomes an equality. For suppose

$$\bar{s} < h^{-1}\Big(\frac{b-\epsilon}{1-\epsilon}\Big) .$$

Then

$$h(\bar{s}) > \frac{b-\epsilon}{1-\epsilon}$$

and we can choose $\delta > 0$ and $y_0$ sufficiently large that

$$(1-\epsilon)h(\bar{s}) + \epsilon\chi\Big(\frac{y_0}{\bar{s}}\Big) - b \ge (1-\epsilon)h(\bar{s}) + \epsilon(1-\delta) - b > 0,$$

contradicting $s_y \le \bar{s}$. □
Proof of Lemma 2.4

Let $s > 1$. For all $x \ge 0$ we have

$$\frac{d}{dx}\,\frac{f_0(sx)}{f_0(x)} \le 0$$

if and only if

$$s\,\frac{f_0'(sx)}{f_0(sx)} \le \frac{f_0'(x)}{f_0(x)} .$$

The last inequality follows since $-f_0'(x)/f_0(x)$ is non-negative and increasing. For $s < 1$ all the reverse inequalities hold. □


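Proof that $\frac{d}{da}\,s^+(a) \ge 0$

$$\frac{d}{da}\,s^+(a) = \Big[ \frac{a\,e^{-a}}{e^{-a}-\epsilon} + \log\Big(\frac{e^{-a}-\epsilon}{1-\epsilon}\Big) \Big] \Big/ a^2 \ge 0$$

if and only if

$$\log(1-\epsilon) \le \gamma(a) = \frac{a\,e^{-a}}{e^{-a}-\epsilon} + \log(e^{-a}-\epsilon) .$$

The last inequality follows because $\gamma(0) = \log(1-\epsilon)$ and

$$\frac{d}{da}\,\gamma(a) = a\,\frac{e^{-a}}{e^{-a}-\epsilon}\Big[ \frac{e^{-a}}{e^{-a}-\epsilon} - 1 \Big] \ge 0 .$$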
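The monotonicity claim can be checked numerically. The sketch below assumes, consistently with the derivative displayed above, that for the exponential $F_0$ and a jump function at $a$ the maximal scale solves $(1-\epsilon)e^{-as} + \epsilon = e^{-a}$, which requires $e^{-a} > \epsilon$. The function names, the value $\epsilon = .1$ and the grid are hypothetical.

```python
from math import exp, log

def s_plus(a, eps):
    """Solve (1 - eps) * exp(-a * s) + eps = exp(-a) for s (needs exp(-a) > eps)."""
    return -log((exp(-a) - eps) / (1.0 - eps)) / a

def ds_plus_da(a, eps):
    """Closed-form derivative of s_plus with respect to a."""
    ratio = exp(-a) / (exp(-a) - eps)
    return (a * ratio + log((exp(-a) - eps) / (1.0 - eps))) / a ** 2

eps = 0.1
a_grid = [0.05 * k for k in range(1, 41)]   # a = 0.05, ..., 2.0, all below -log(eps)
values = [s_plus(a, eps) for a in a_grid]
derivs = [ds_plus_da(a, eps) for a in a_grid]
```

Comparing `ds_plus_da` with a central finite difference of `s_plus` is a simple way to confirm that the closed form and the defining equation agree.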


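Proof of Lemma 5.1

Proof: It suffices to consider the case $s = 1$. Extend the domain of $\chi$ to $R$ by setting $\chi(-t) = \chi(t)$, $t \ge 0$. Set $\Delta = (t_1 + t_2)/2$, $0 \le t_1 \le t_2 < \infty$. Then

$$\int_{-\infty}^{\infty} [\chi(x-t_2) - \chi(x-t_1)]\, f_0(x)\,dx$$

$$= \int_{-\infty}^{\Delta} [\chi(x-t_2) - \chi(x-t_1)]\, f_0(x)\,dx + \int_{\Delta}^{\infty} [\chi(x-t_2) - \chi(x-t_1)]\, f_0(x)\,dx$$

$$= \int_{\Delta}^{\infty} [\chi(t_1-y) - \chi(t_2-y)]\, f_0(t_1+t_2-y)\,dy + \int_{\Delta}^{\infty} [\chi(y-t_2) - \chi(y-t_1)]\, f_0(y)\,dy$$

$$= \int_{\Delta}^{\infty} [\chi(x-t_2) - \chi(x-t_1)]\,[f_0(x) - f_0(x-2\Delta)]\,dx \ \ge\ 0 .$$

The final integral is non-negative because, for $x \ge \Delta$, we have $|x - t_2| \le |x - t_1|$ and $|x - 2\Delta| \le |x|$, so both bracketed factors are non-positive when $\chi$ is non-decreasing on $[0,\infty)$ and $f_0$ is decreasing on $[0,\infty)$. □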
Proof of Theorem 5.1

Proof: By (L1) and Lemma 5.1 we have for all $s > 0$

$$h(s,\mu(F)) \le h(s,\bar\mu) \qquad \forall\, F \in \mathcal{F}^+$$

and so

$$h_{\chi,\mu}(s,F) \le (1-\epsilon)\,h(s,\bar\mu) + \epsilon$$

with the right-hand side achieved when $H = \delta_\infty$, the point mass at infinity. Equating the right-hand side to $b$ yields (5.7). On the other hand, for all $s > 0$ we also have

$$h_{\chi,\mu}(s,F) \ge (1-\epsilon)\,h(s,0) \qquad \forall\, F \in \mathcal{F}^+$$

with the right-hand side achieved when $H = \delta_0$, the point mass at the origin. Now, equating the right-hand side to $b$ gives (5.8). □

Statement  and  Proof  of  Lemma  6.1 

The  following  notation  is  needed  for  Lemma  6.1,  below: 

Let  t  =  su  and  set 

A  (s  ,t  ,b  x,b2)  =  (s2-l)[l+e  2tl>1-2 tb2f  t2e  U>x 
$$A_4(s,t) = A(s,t,\tfrac{3}{4},1)$$

Lemma 6.1: Suppose that

$$s(s-u) \ge 1$$

and

$$\min_{1 \le k \le 4} A_k(s,t) > 0 .$$

Then

$$h(x,s,u) = \frac{\varphi(sx+u) + \varphi(sx-u)}{\varphi(x)}$$

is a decreasing function of $x$.


Proof:

$$h(x,s,u) = e^{-\frac{1}{2}((s^2-1)x^2+u^2)}\,(e^{sux} + e^{-sux})$$

and

$$\frac{\partial}{\partial x}\,h(x,s,u) = e^{-\frac{1}{2}((s^2-1)x^2+u^2)}\,\big[ -x(s^2-1)(e^{-sux}+e^{sux}) - su\,e^{-sux} + su\,e^{sux} \big] .$$

Then

$$\frac{\partial}{\partial x}\,h(x,s,u) \le 0 \iff su \le x(s^2-1)(1+e^{-2sux}) + su\,e^{-2sux} \qquad (6.1)$$

The last inequality holds for $x \ge 1$ since, by hypothesis,

$$x(s^2-1)(1+e^{-2sux}) + su\,e^{-2sux} \ge (s^2-1) \ge su .$$

If $x = 0$ then equality holds. Therefore, we only need to prove that the right-hand side of (6.1) holds when $0 < x < 1$. Set $g(x) = x(s^2-1)(1+e^{-2tx}) + t\,e^{-2tx}$;

$$g'(x) = (s^2-1)\,[1 + e^{-2tx} - 2tx\,e^{-2tx}] - 2t^2 e^{-2tx}$$

$$g'(x) \ge \min_{1 \le k \le 4} A_k > 0$$

and the lemma follows. □
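A direct grid check of the monotonicity of $h(x,s,u)$ is a useful complement to the sufficient conditions of Lemma 6.1. The sketch below uses the exponential form of $h(x,s,u)$ from the proof. The pair $s = 1.065$, $u = .066$ corresponds to the $\epsilon = .05$ row of Table 4a ($s^+ = 1 + .065$, $\bar\mu_{med} = .066$); the second pair is a hypothetical example chosen to show a failure of monotonicity near $x = 0$. The function names and grid are assumptions.

```python
from math import cosh, exp

def k_ratio(x, s, u):
    """[phi(s*x + u) + phi(s*x - u)] / phi(x) for the standard normal density phi."""
    return exp(-0.5 * ((s * s - 1.0) * x * x + u * u)) * 2.0 * cosh(s * u * x)

def is_decreasing(s, u, x_max=6.0, steps=600):
    """Check that k_ratio(., s, u) does not increase along a grid on [0, x_max]."""
    xs = [x_max * i / steps for i in range(steps + 1)]
    vals = [k_ratio(x, s, u) for x in xs]
    return all(v1 <= v0 for v0, v1 in zip(vals, vals[1:]))

small_eps = is_decreasing(s=1.065, u=0.066)   # Table 4a, eps = .05
large_bias = is_decreasing(s=1.2, u=0.9)      # hypothetical values
```

A grid check can only suggest monotonicity on the range examined; Lemma 6.1 is what turns such a check into a verification.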

